Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCS_MAVLink: respect txbuf flow control for FTP messages #26098

Merged
merged 3 commits into from
Apr 15, 2024

Conversation

brad112358
Copy link
Contributor

This gives slow radio links a fighting chance of getting FTP bulk download working efficiently, even when they use a baud rate which is much higher than their current bandwidth. The strategy is similar to what Ardupilot already does for PARAM_VALUE messages.

This should eliminate the need to disable FTP for parameter download on slow to moderate speed radio links like mLRS. It allows removal of a hack in mLRS which results in a decrease in parameter download time for 19 Hz mLRS from 45-60 seconds to 11-17 seconds. Other mLRS message rates show similar improvement. This should also be good news for the ELRS rc-mavlink branch.

Also, improve get_last_txbuf() to allow message flow in the unlikely event that we stop receiving radio_status messages.

This has also been tested with TBS Crossfire and SiK radios and found to have no negative impact on parameter download.

libraries/GCS_MAVLink/GCS_Common.cpp Outdated Show resolved Hide resolved
libraries/GCS_MAVLink/GCS_FTP.cpp Outdated Show resolved Hide resolved
@olliw42
Copy link
Contributor

olliw42 commented Jan 30, 2024

I wonder if it's a good idea to put all mftp replies under txbuf control, instead of only those of the read/readburst stream. Putting also the mftp replies for "negotiating" under control may make it harder for gcses to get their timeouts right.

libraries/GCS_MAVLink/GCS.h Outdated Show resolved Hide resolved
@brad112358
Copy link
Contributor Author

I wonder if it's a good idea to put all mftp replies under txbuf control, instead of only those of the read/readburst stream. Putting also the mftp replies for "negotiating" under control may make it harder for gcses to get their timeouts right.

@olliw42 Did you happen to test this? (I ask just so we can know what you saw if you did.) Either way, I think I understand why you might wonder. It does sometimes (less than 10% in my recent tests) happen that we wait so long for buffer space that the GCS times out and ends up re-sending the first FTP message (OpenFileRO in the case of QGC).

I have some thoughts about this.

  1. I'd ideally like to make this work reasonably well, at least for mLRS without requiring the GCSes to modify their current timeouts, though I do think they could be improved in this area. In the case of QGC, the timeout for all expected FTP messages is 1 second which does seem a little low for low bandwidth links, especially given the bursts of traffic Ardupilot tends to send. (PX4 does better at spreading out normal telemetry streams)

  2. If we send the response to OpenFileRO without regard to reported buffer space, there is a real risk it or other messages might be lost. Especially if the radio involved doesn't work as hard as mLRS does to avoid this.

  3. If we send the response to OpenFileRO when buffer space is low, even if the response is not lost because of lack of space, we have just deferred the timeout problem. The GCS is just as likely to time out while we wait for space when we try to send the first response to the BurstReadFile message which will arrive very soon after we send the response to OpenFileRO.

Instead, I think it might try limiting the delay due to txbuf to about 500ms for all message types. If we don't have more than .5 seconds worth of data buffered, that might help. This would, however never work with a radio bandwidth of less than about 300 bytes/second. Note that mLRS (which currently has at least 1500 Bytes/sec). Of course, the current GCS timeouts would also need to change for lower bandwidth and higher latency.

@brad112358
Copy link
Contributor Author

I've made several improvements in the latest commit and I think I've addressed all the comments. This is working even better in my tests.

@wvarty
Copy link

wvarty commented Mar 31, 2024

Tested with:

  • ExpressLRS 2.4G hardware running the mavlink-rc branch, with MavFTP re-enabled
  • Speedybee F405 wing FC flashed with this PR
  • 250Hz packet rate, and also 333Hz-full rate

Params load substantially faster than with individual param polling (as expected) on both rates.
Mission load using both MavFTP and regular.

Flashed the FC back to mainline and confirmed that the original issues still exist: FTP param load fails (due to overloading the buffer in ELRS) and it falls back to polled param mode.

Whats left to get this merged?

@timtuxworth
Copy link
Contributor

timtuxworth commented Apr 1, 2024

Looking at this slightly differently - I wonder why the GCS needs to get all the parameters up front in order for the vehicle to fly. I'm thinking this problem might have evolved organically. Perhaps getting all parameters up front was a much smaller ask a few years ago, but as more and more parameters are added, we just tack them into the package of what gets downloaded.

I notice this most because I typically use MP on the bench and QGC when I'm flying. When using QGC I operate with far fewer parameters that I care about while at the field, whereas on the bench I definitely need a lot more of them, often because MP needs them, but I"m pretty much always connected by USB when I use Mission Planner.

Would it be possible to narrow this down to an initial download of a much smaller package of "essential" parameters required for flying while connected over ELRS?

[I recognize this comment/conversation probably belongs somewhere else, but I don't know where that might be].

@wvarty
Copy link

wvarty commented Apr 1, 2024

I've pushed up ExpressLRS/ExpressLRS#2626 on the ELRS side, for when this merges.

Copy link
Member

@IamPete1 IamPete1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase, its conflicting with something that has been merged recently.

libraries/GCS_MAVLink/GCS_FTP.cpp Show resolved Hide resolved
@@ -155,9 +146,25 @@ void GCS_MAVLINK::ftp_error(struct pending_ftp &response, FTP_ERROR error) {
// send our response back out to the system
void GCS_MAVLINK::ftp_push_replies(pending_ftp &reply)
{
ftp.last_send_ms = AP_HAL::millis(); // Used to detect active FTP session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we want to detect the active FTP session as soon as possible in get_reschedule_interval_ms(), even if we are blocked from sending the first response because of txbuf flow control. This slows down normal streams sooner and allows the radio's buffer to drain more quickly and reduces the delay the GCS sees, which helps avoid GCS timout on the first FTP response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes sense. But its there now a risk of of FTP timing out if we sit here waiting to send for a while. Looks like the threshold is 3 seconds? Maybe we need a better way to check if there is FTP stuff happening than just looking at last_send_ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, since messages can be lost and the GCS may give up on FTP, testing for FTP activity will need to involve a timeout and I think this may be the simplest way. Where is the timeout 3 seconds? QGC seems to use 1 second timeouts.

bool GCS_MAVLINK::last_txbuf_is_greater(uint8_t txbuf_limit)
{
if (AP_HAL::millis() - last_radio_status.received_ms > 5000) {
// stale report
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// stale report
// stale report, assume 100% buffer space

@@ -759,6 +759,7 @@ class GCS_MAVLINK
uint32_t remrssi_ms;
uint8_t rssi;
uint32_t received_ms; // time RADIO_STATUS received
uint8_t txbuf = 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new timeout we don't need to init to 100 anymore.

Suggested change
uint8_t txbuf = 100;
uint8_t txbuf;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add the check for 0 ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more efficient to initialize just once than to execute the extra check every time.

@@ -827,6 +827,15 @@ float GCS_MAVLINK::telemetry_radio_rssi()
return last_radio_status.rssi/254.0f;
}

bool GCS_MAVLINK::last_txbuf_is_greater(uint8_t txbuf_limit)
{
if (AP_HAL::millis() - last_radio_status.received_ms > 5000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (AP_HAL::millis() - last_radio_status.received_ms > 5000) {
if ((last_radio_status.received_ms == 0) || ((AP_HAL::millis() - last_radio_status.received_ms) > 5000)) {

Check for zero to make sure we do the correct thing if never received.

Copy link
Contributor Author

@brad112358 brad112358 Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a comment pointing out that the initialization of txbuf accomplishes this be OK instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, comment is OK too. Really the key thing is to be able to understand what is going on without having to go look in the .h.

Copy link
Member

@IamPete1 IamPete1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs a squash to remove the merge commit.

@@ -75,7 +75,7 @@ GCS_MAVLINK::queued_param_send()
}
count -= async_replies_sent_count;

while (count && _queued_parameter != nullptr && get_last_txbuf() > 50) {
while (count && _queued_parameter != nullptr && last_txbuf_is_greater(33)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a static constexpr for the magic number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would we call it? one_third? Is that really better? Anything less obvious and I would be annoyed when reading the code to have to look up the constant. I considered eliminating the parameter and hard code 33 in last_txbuf_is_greater(), but I wanted to allow for other threshold values, even though I doubt we will need them.

But I'll make the change if you let me know what I should call it.

@tridge
Copy link
Contributor

tridge commented Apr 1, 2024

@brad112358 thanks! needs a rebase and squash. If you don't know how to do a git squash then ask on the code_review channel and I can do it

@IamPete1
Copy link
Member

IamPete1 commented Apr 2, 2024

A tlog of a test on real hardware would be useful. With my SIM (#26663) I'm seeing the FTP replies carry on for a long time after mission planner says the params have been fetched (like 30 seconds). It could be a mission planner issue of course, or we might need some fixes on top of this. I would guess mission planner gets impatient and requests reads multiple times resulting lots of duplicate requests.

@timtuxworth
Copy link
Contributor

It could be a mission planner issue of course, or we might need some fixes on top of this. I would guess mission planner gets impatient and requests reads multiple times resulting lots of duplicate requests.

It might be useful to test with QGC Peter. I've noticed that it behaves differently (sometimes better) and having that different behavior might give different/additional data to help figure out what is going on.

@brad112358
Copy link
Contributor Author

@brad112358 thanks! needs a rebase and squash. If you don't know how to do a git squash then ask on the code_review channel and I can do it

Thanks. I will learn how to do that and push again after recent review comments are resolved.

@brad112358
Copy link
Contributor Author

@IamPete1 and @peterbarker If you could respond to my comments on your suggested changes, then I can clean this up.
Thanks!

@@ -1142,7 +1151,7 @@ uint16_t GCS_MAVLINK::get_reschedule_interval_ms(const deferred_message_bucket_t
// we are sending requests for waypoints, penalize streams:
interval_ms *= 4;
}
if (AP_HAL::millis() - ftp.last_send_ms < 500) {
if (AP_HAL::millis() - ftp.last_send_ms < 1000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd that we would use 1 second here but 3 seconds to time out a FTP session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at QGroundControl which has a 1 second ack/nak timeout with 3 retries. Does Mission Planner use a 3 second timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of this timeout:

#define FTP_SESSION_TIMEOUT 3000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(our responses crossed here) OK. I think things should work OK with this changed to 3 seconds. The stream rates will just take 3 seconds to go back to normal instead of 1. Should we move the define to a header file and use FTP_SESSION_TIMEOUT instead of 500 or 1000 in GCS_Common.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after more consideration, I think there is little reason that this timeout should be the same as the session timeout. This is essentially a heuristic and should be based on the expected behavior of the GCS, including radio latency, but doesn't need to match the session timeout which really needs to be conservative (long) as discarding the open state too early would be bad. With the current GCS behavior, I think 1 second is fine.
On the other hand, it would be easier for slower radios to make this work well if the GCS FTP retry timeout was a bit longer, say 2 seconds. Bumping this from 500 to 1000 was enough for mLRS which was already working hard to keep buffered data to a minimum. But, ELRS is even slower than mLRS. Should we increase this to 2 seconds in anticipation that some work may also be required on the GCS side to help out ELRS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that if FTP thinks the session is still active then we should still lower its data rates. However a quick test shows the MAVProxy terminates the session after param download whereas QGC and MP do not. If we get the GCS to terminate correctly there should be no problem in using the longer time out here.

@IamPete1
Copy link
Member

IamPete1 commented Apr 5, 2024

@IamPete1 and @peterbarker If you could respond to my comments on your suggested changes, then I can clean this up. Thanks!

Any chance of that tlog? As I say even with this I didn't get good results (although still better than without), have you been testing with Mission planner or another GCS?

@brad112358
Copy link
Contributor Author

Any chance of that tlog? As I say even with this I didn't get good results (although still better than without), have you been testing with Mission planner or another GCS?

I don't use ELRS so I've only tested with mLRS and briefly with SiK and Crossfire.
I've been using this change in the Copter-4.4 branch with QGroundControl and mLRS when the weather permits flying for a couple of months. I've also tested with the master branch of Ardupilot on Mission Planner, QGC, and if I remember correctly, briefly with MAVProxy.

I'm traveling for the next few days, but I can capture tlogs with mLRS when I get back if I don't get to it sooner.

Perhaps I can also test with ELRS as I do have hardware it supports, but I expect that ELRS might require some work to get it working perfectly with FTP, even with this PR.

@IamPete1
Copy link
Member

IamPete1 commented Apr 6, 2024

I don't use ELRS so I've only tested with mLRS and briefly with SiK and Crossfire.

Ah, OK, if you can give me a rough data rate for mLRS I can set that in the simulator and see how it gets on. I will also try out QGC.

Sorry for being slow, i'm keen to get #26532 so we can see the tx and rx data rates which should make testing easyer.

@brad112358
Copy link
Contributor Author

brad112358 commented Apr 6, 2024

The nominal down link data rates for mLRS are 1558, 2542, 4100, and 9102 bytes/sec. Up link is slightly slower to allow separate RC data. Data is sent in 82 byte chunks at 19, 31, 50, or 111 Hz. There is also a mode which can compress data, but it doesn't make a big improvement and I haven't been using it. My tests are all performed at the 19 Hz, 1558 bytes/sec rate, because I only have Frsky R9 hardware for mLRS and it is limited to this rate. Other mLRS users have tested this PR for me with the other rates. At the 1558 B/s rate, this PR improved parameter download time from about 50 seconds to about 6 seconds because FTP was able to be used with no lost messages and no retrys. The higher rates saw less dramatic improvements.

Note, however, that in mLRS, Radio Status messages are not always sent at a fixed rate and the values for txbuf do not represent actual percentage, but are fudged for Ardupilot or PX4 with the goal of using between 75 and 85 percent of the available data bandwidth. This results in very low average buffer use in steady state conditions even though we allocate a fairly large, 2K byte buffer to allow clumps of messages. Usually the buffer is empty by the time the next group of messages arrives. This is quite different than a simple direct calculation of txbuf which seeks to keep the buffer utilization at around 50% full and results in higher latency than what is accomplished in mLRS, especially for very slow links.

I thought I had heard that ELRS was using the same strategy as mLRS, but I see that your emulation is simply sending txbuf based on buffer used at a higher fixed rate so I'm not sure if ELRS discarded the ideas from mLRS at some point or if they have always used a fixed rate with actual buffer percentages. It is possible that the trouble you are seeing in your ELRS emulation is due to high latency because too much data is being buffered. If so, increasing the timeouts in the GCS can help, but ELRS may also need to make changes to reduce buffered data related latency.

You can find the method mLRS uses here.

@IamPete1
Copy link
Member

IamPete1 commented Apr 11, 2024

Rebased, fixed the conflict, removed the merge commit and squashed the comment typo fixup commit.

@IamPete1
Copy link
Member

Some testing with my sim set to a limit of 1500B/s.

Without this patch:
image
TxBuf is a 0 for the duration of the FTP. Average data rate is hard up against 1500. FTP takes 16 seconds.

With patch:
image

TxBus no longer hard on zero, average data rate is the same 1500. FTP takes 12 second.

In my testing FTP succeeded both times. When I tested before with a 500B/s data rate limit it failed both times. But there is probably somewhere in between where this could make the difference between FTP working or not and result in a much more significant improvement.

I think this is a step in the right direction, but I suspect we will need to do more to get the most from the more limited links.

@IamPete1 IamPete1 dismissed their stale review April 11, 2024 00:35

Merge commit removed.

@brad112358
Copy link
Contributor Author

brad112358 commented Apr 11, 2024

To restate my goal, this PR is just providing radios with a way to make flow control work for FTP. I think anything more is impractical given the limits of the information contained in the existing RADIO_STATUS message.

I never expected that this change would accomplish optimal FTP performance with most existing radio strategies for sending txbuf updates. It just provides an opportunity (I used the phrase fighting chance for a reason) for even slow radios to make it work efficiently with some effort on the part of the radio developers. There are a few possible approaches I have experimented with on the radio side which have been successful. In the best case where the txbuf strategy on the radio side has already been designed to work well with existing Ardupilot code for non-FTP based bulk parameter download, FTP probably also works much better with this PR because it makes FTP respect txbuf the same way bulk parameter download currently does. But, mLRS is the only radio I know of that has been completely successful in developing an effective strategy on the radio side. Crossfire had some test releases that worked well for traditional bulk parameter download a couple of years ago, but they seemed to have dropped the effort when I stopped pestering them about it. Radios like SiK which send radio status only at a fixed 1 Hz rate have no hope of efficiently supporting lumpy traffic like FTP even with this PR. Until more radios can be improved to take advantage of the opportunity this PR is attempting to provide, we are stuck trying to match the serial baud rate with the radio link speed and hope the 1/3 utilization limit is effective.

But mLRS does work really well with this change and I think ELRS MAVLink mode can also easily be made to work quite well with this PR since it is still in development. For the rest of the existing radios, my short term goal is only not to make things worse. In the long term, they can be made more efficient as well if anyone cares enough to try.

What is the simulated link baud rate of your test? When the baud rate is low enough compared to the radio throughput, FTP will probably work, though, not optimally because of the artificial 1/3 link speed rate limiting which Ardupilot now does even without txbuf flow control. But, I normally use 230K baud for MAVLink with mLRS to minimize transmission based message latency (mLRS accumulates and parses every message at both ends of the link before starting to send it over the radio link, so a slow baud rate impacts latency even more than it might otherwise). Last I checked, ELRS was using a similar high serial baud rate, probably for the same reasons. At these higher baud rates, FTP will fail completely without some form of flow control. Also, what is the radio buffer size in your tests? I assume you are not emulating what mLRS does, but just reporting raw txbuf percentages. How often are you sending radio status? At higher baud rates you need to have a huge buffer (bad) or send status very frequently if sending at a fixed rate or (better) send an extra radio status report whenever the buffer nears full or empty to stop or start the flow immediately. If the radio uses a good strategy for when to send radio status, you should be able achieve near full radio link utilization during FTP transfers without message loss with a reasonably small buffer. But you need smarts in the radio in addition to something like this PR on the FC side.

@brad112358
Copy link
Contributor Author

brad112358 commented Apr 12, 2024

I have synced my local repository with the latest changes here (and from master), configured and built for one of my 19 Hz (1558 Bps) mLRS builds. I have re-run a few quick sanity tests and uploaded tlogs covering the initial connection and parameter download for QGC (Android), MP (Windows), and MAVProxy (Linux) here. I have not analyzed these logs, but the parameter upload appeared to succeed quickly as expected in all 3 tests.

Sorry it took so long for me to get to this.

Please let me know what you gleam from these logs.

Thanks!

@brad112358
Copy link
Contributor Author

In all these years, I've never bothered with SITL, but I guess I should figure out how to set it up so I can see your radio simulator work...

@brad112358
Copy link
Contributor Author

Sorry, I see I forgot to enable general access to that google drive folder. I've fixed that now.

@IamPete1
Copy link
Member

The QGC log looks more or less as I would have expected.

Looks like the mission planner on FTP failed and it grabbed params the old way.

The MAVProxy one looks OK, it does FTP, but then we have a mission planner instance appear and grab param the old way.

@brad112358
Copy link
Contributor Author

brad112358 commented Apr 13, 2024

What are you using to examine the tlog files?

Looks like the mission planner on FTP failed and it grabbed params the old way.

I'll try it again. I didn't time it, but it seemed to finish faster than I thought it would with bulk parameter transfer the old way (normally takes 5-8 times as long as FTP takes)

The MAVProxy one looks OK, it does FTP, but then we have a mission planner instance appear and grab param the old way.

That is a bit of a mystery. How do you know it was mission planner? I used a serial to WiFi UDP bridge in AP mode connected to the Tx module for this test and MAVProxy was running on a Linux system which was supposed to be the only client connected to the bridge WiFi network and there is no mission planner installed there. I did have my other notebook still booted into windows upstairs, but it was only connected to my home WiFi and not the bridge AP because it was downloading windows updates. I do sometimes run apmplanner2 on that Linux box, but I didn't think it was still running at the time either.

SITL was surprisingly trivial to get running. I created a local branch with both your radio simulator and this code but I'm not sure how to configure and enable the radio simulation and connect a second GS to it or how to make the first MAVProxy instance use the simulated radio link. Can you give me some hints?

@IamPete1
Copy link
Member

What are you using to examine the tlog files?

https://firmware.ardupilot.org/Tools/WebTools/StreamStats/

Its still very WIP, you'll probably want to turn down the window size for looking at param stuff.

The MAVProxy one looks like this:
image

We see the FTP in purple, takes a little over 10 seconds. We don't see it on the message rate plot (I think because there is only one point so there is no line) but the total plot shows a huge spike at the end of the FTP. Its too fast to actually have come over the data link I think, The top of the page shows that there were some param requests from mission planner, but they were coming from system ID 1 which is all a bit odd, maybe mission planner was connecting via UDP to the forwarding from MAVProxy or something like that:

image

We do see radio messages from sys ID 51.
image

So I guess that is the ground side radio. ELRs don't send on the ground side, so that must be something that MLRS does, although looks like we don't get buffer info, just noise stats.

Then finaly we see the GCS on 255,
image

More or less what we would expect, some heartbeats, stream rate requests, commands and the FTP requests.

This tool does not show any of the actual data in the message, just what there is and the rates, but UAV plotter can take tlogs if you want to look inside the messages. https://plot.ardupilot.org/#/

@IamPete1
Copy link
Member

SITL was surprisingly trivial to get running. I created a local branch with both your radio simulator and this code but I'm not sure how to configure and enable the radio simulation and connect a second GS to it or how to make the first MAVProxy instance use the simulated radio link. Can you give me some hints?

You can add the "radio" to any serial port with the extra command -A --serial2=sim:ELRS:tcp:3 In that example it adds it to serial 2 and then you connect on the standard serial 2 port which is 5763.

SITL needs a GCS to run, so you would typically connect a second GCS instance to the second port.

This gives slow radio links a fighting chance of getting FTP bulk download working even when they use a baud rate which is much higher than their current bandwidth.

This should eliminate the need to disable FTP for parameter download on slow to moderate speed radio links like mLRS and ELRS.  It allows removal of a hack in mLRS which results in a decrease in parameter download time for 19 Hz mLRS from 45-60 seconds to 11-17 seconds.  This should also be good news for the ELRS rc-mavlink branch.
Add check for stale radio_status to get_last_txbuf()
Move last_txbuf into last_radio_status struct
Delete unneeded orphan comment
replace get_last_txbuf() with a predicate
Make txbuf flow control threashold consistent between Parameter download and FTP and keep it in range where we are also slowing down normal streams
Delay sending text banner until after first FTP response to reduce latency on slow links
Don't let flow control delay setting ftp.last_send_ms so as to slow down normal streams as soon as possible to improve FTP response time
@IamPete1
Copy link
Member

We were having no luck getting past CI, so I have rebased to see if that will clear it.

@brad112358
Copy link
Contributor Author

Looks like the mission planner on FTP failed and it grabbed params the old way.
The MAVProxy one looks OK, it does FTP, but then we have a mission planner instance appear and grab param the old way.

I ran both the MP and MAVProxy tests again and snooped on the messages with wireshark. I didn't see any sign of what you observed above so I looked at the original tlogs again with StreamStats. In both cases the parameter messages you noted are reported in a huge burst all at the exact time that the FTP transfer completed. I haven't checked the source code to be sure, but I'm guessing these are just parameter details which MP and MAVProxy both place in the tlog and StreamStats is interpreting this data as messages. I'm guessing QGC doesn't inject the parameter values in the tlog, or perhaps it uses a different format that doesn't show up in StreamStats.

So, I think everything looks as expected with all 3 GCS.

Copy link
Member

@IamPete1 IamPete1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tridge tridge merged commit 6538e8c into ArduPilot:master Apr 15, 2024
91 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants